
 data silo


Accelerating drug discovery with Artificial: a whole-lab orchestration and scheduling system for self-driving labs

Fehlis, Yao, Mandel, Paul, Crain, Charles, Liu, Betty, Fuller, David

arXiv.org Artificial Intelligence

Author affiliation: Artificial Inc. Abstract: Self-driving labs are transforming drug discovery by enabling automated, AI-guided experimentation, but they face challenges in orchestrating complex workflows, integrating diverse instruments and AI models, and managing data efficiently. Artificial addresses these issues with a comprehensive orchestration and scheduling system that unifies lab operations, automates workflows, and integrates AI-driven decision-making. By incorporating AI/ML models such as NVIDIA BioNeMo, which facilitates molecular interaction prediction and biomolecular analysis, Artificial enhances drug discovery and accelerates data-driven research. Through real-time coordination of instruments, robots, and personnel, the platform streamlines experiments, enhances reproducibility, and advances drug discovery. Introduction: The landscape of drug discovery has long been characterized by a multitude of challenges, including the high costs of research and development, lengthy timelines, and a significant rate of failure during clinical trials (Blanco-Gonzalez et al., 2023; Udegbe et al., 2024; Khanna, 2012; Moffat et al., 2017).


Contrastive Federated Learning with Tabular Data Silos

Ginanjar, Achmad, Li, Xue, Hua, Wen

arXiv.org Artificial Intelligence

Learning from data silos is a difficult task for organizations that need to obtain knowledge of objects that appear in multiple independent data silos. Objects held by multiple organizations, such as government agencies, are referred to by different identifiers, such as driver's license number, passport number, and tax file number. The data distributions in data silos are mostly non-IID (not independently and identically distributed), unlabeled, and vertically partitioned (i.e., having different attributes). Privacy concerns compound these issues and dampen enthusiasm for collaborative work. While Federated Learning (FL) has been proposed to address these issues, the difficulty of labeling, namely label costliness, often hinders optimal model performance. A potential solution lies in contrastive learning, an unsupervised self-learning technique that represents semantic data by contrasting similar data pairs. However, contrastive learning is currently not designed to handle tabular data silos that exist across multiple organizations, where data linkage by quasi-identifiers is needed. To address these challenges, we propose using semi-supervised contrastive federated learning, which we refer to as Contrastive Federated Learning with Data Silos (CFL). Our approach tackles the aforementioned issues with an integrated solution. Our experimental results demonstrate that CFL outperforms current methods in addressing these challenges and provides improvements in accuracy. Additionally, we present positive results that showcase the advantages of our contrastive federated learning approach in complex client environments.


On Vessel Location Forecasting and the Effect of Federated Learning

Tritsarolis, Andreas, Pelekis, Nikos, Bereta, Konstantina, Zissis, Dimitris, Theodoridis, Yannis

arXiv.org Artificial Intelligence

The widespread adoption of the Automatic Identification System (AIS) has motivated several maritime analytics operations. Vessel Location Forecasting (VLF) is one of the most critical operations for maritime awareness. However, accurate VLF is a challenging problem due to the complexity and dynamic nature of maritime traffic conditions. Furthermore, as privacy concerns and restrictions have grown, training data has become increasingly fragmented, resulting in dispersed databases of several isolated data silos among different organizations, which in turn decreases the quality of learning models. In this paper, we propose an efficient VLF solution based on LSTM neural networks, in two variants, namely Nautilus and FedNautilus, for the centralized and the federated learning approach, respectively. We also demonstrate the superiority of the centralized approach with respect to the current state of the art and discuss the advantages and disadvantages of the federated approach compared with the centralized one.


Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos

Zhang, Tianyi, Cao, Yu, Liu, Dianbo

arXiv.org Artificial Intelligence

Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Through experiments conducted on five datasets, our method has demonstrated its superiority, achieving significant improvements in accuracy (by 3%--22.1%) and uncertainty reduction (by 38.83%--96.24%), thereby outperforming contemporary state-of-the-art methods. The source code is available at https://github.com/destiny301/uefl.


Adaptive Distributed Kernel Ridge Regression: A Feasible Distributed Learning Scheme for Data Silos

Wang, Di, Liu, Xiaotong, Lin, Shao-Bo, Zhou, Ding-Xuan

arXiv.org Machine Learning

Data silos, mainly caused by privacy and interoperability concerns, significantly constrain collaborations among different organizations with similar data for the same purpose. Distributed learning based on divide-and-conquer provides a promising way to overcome data silos, but it faces several challenges, including autonomy, privacy guarantees, and the necessity of collaborations. This paper focuses on developing an adaptive distributed kernel ridge regression (AdaDKRR) that takes into account autonomy in parameter selection, privacy in communicating non-sensitive information, and the necessity of collaborations for performance improvement. We provide both solid theoretical verification and comprehensive experiments for AdaDKRR to demonstrate its feasibility and effectiveness. Theoretically, we prove that under some mild conditions, AdaDKRR performs similarly to running the optimal learning algorithms on the whole data, verifying the necessity of collaborations and showing that no other distributed learning scheme can essentially beat AdaDKRR under the same conditions. Numerically, we test AdaDKRR on both toy simulations and two real-world applications to show that AdaDKRR is superior to other existing distributed learning schemes. All these results show that AdaDKRR is a feasible scheme to defend against data silos, a capability that is highly desired in numerous application areas such as intelligent decision-making, price forecasting, and performance prediction for products.


The Role of Cross-Silo Federated Learning in Facilitating Data Sharing in the Agri-Food Sector

Durrant, Aiden, Markovic, Milan, Matthews, David, May, David, Enright, Jessica, Leontidis, Georgios

arXiv.org Artificial Intelligence

Data sharing remains a major hindering factor when it comes to adopting emerging AI technologies in general, but particularly in the agri-food sector. Protectiveness of data is natural in this setting; data is a precious commodity for data owners, which, if used properly, can provide them with useful insights on operations and processes, leading to a competitive advantage. Unfortunately, novel AI technologies often require large amounts of training data in order to perform well, something that in many scenarios is unrealistic. However, recent machine learning advances, e.g. federated learning and privacy-preserving technologies, can offer a solution to this issue by providing the infrastructure and underpinning technologies needed to use data from various sources to train models without ever sharing the raw data themselves. In this paper, we propose a technical solution based on federated learning that uses decentralized data (i.e. data that are not exchanged or shared but remain with the owners) to develop a cross-silo machine learning model that facilitates data sharing across supply chains. We focus our data sharing proposition on improving production optimization through soybean yield prediction, and provide potential use-cases in which such methods can assist in other problem settings. Our results demonstrate not only that our approach performs better than each of the models trained on an individual data source, but also that data sharing in the agri-food sector can be enabled via alternatives to data exchange, whilst also helping to adopt emerging machine learning technologies to boost productivity.


Federated Alternate Training (FAT): Leveraging Unannotated Data Silos in Federated Segmentation for Medical Imaging

Mushtaq, Erum, Bakman, Yavuz Faruk, Ding, Jie, Avestimehr, Salman

arXiv.org Artificial Intelligence

Federated Learning (FL) aims to train a machine learning (ML) model in a distributed fashion to strengthen data privacy with limited data migration costs. It is a distributed learning framework naturally suitable for privacy-sensitive medical imaging datasets. However, most current FL-based medical imaging works assume silos have ground truth labels for training. In practice, label acquisition in the medical field is challenging as it often requires extensive labor and time. To address this challenge and leverage the unannotated data silos to improve modeling, we propose an alternate training-based framework, Federated Alternate Training (FAT), that alternates training between annotated data silos and unannotated data silos. Annotated data silos exploit annotations to learn a reasonable global segmentation model. Meanwhile, unannotated data silos use the global segmentation model as a target model to generate pseudo labels for self-supervised learning. We evaluate the performance of the proposed framework on two naturally partitioned federated datasets, KiTS19 and FeTS2021, and show its promising performance.


Integration remains key challenge for digital transformation

#artificialintelligence

It's a business pain point most know only too well, and new research confirms that integration challenges are not just a pain, they're slowing companies' digital ambitions and causing infrastructure issues and risks. MuleSoft's eighth annual Connectivity Benchmark Report shows the number of applications in Australian organisations (sorry, New Zealand, there are no Kiwi results in this one) has increased nearly 10 percent in the past year, to 1,032, highlighting the complexity of the digital landscape. But 68 percent of those applications are not integrated with other applications used by the business, creating data silos and the flow-on effects, including increased costs, duplicated work, productivity bottlenecks and disconnected experiences. It's a situation that's proving costly – not just in terms of money spent building custom integrations (read on for those eye-watering figures) but also in the slowing of digital transformation efforts – something 84 percent of Australians said was happening, causing infrastructure issues and major risks as IT budgets come under increased scrutiny. And the cost of failing to complete digital transformation initiatives successfully?


Are Data Silos Undermining Digital Transformation? - ReadWrite

#artificialintelligence

At a time of seemingly ultrarapid digital disruptions, digital transformation in an enterprise needs a bold vision and an intent to embrace change. With the global digital transformation market projected to reach $2.8 trillion in 2025, leaders are expediting their transition to digital across their organizations. And as enterprises course-correct and adapt to specific strategies along this journey, they need a sound understanding of their data to drive informed decisions. This understanding matters because high-quality data is at the heart of all digitalization initiatives, from delivering invaluable insights to uncovering latent operational efficiency strategies. And that's the reason organizations must be careful about the creation of data silos. Today 73.5% of leading companies are data-driven in their decision-making.


La veille de la cybersécurité

#artificialintelligence

Artificial intelligence doesn't just pop up when you install tools and software. It takes planning and, most of all, it takes data. But getting the right data for AI and machine learning algorithms, and understanding it, is where many organizations are slipping up, a recent study finds. Organizations face difficulties with data silos, explainability, and transparency, a study of 150 data executives commissioned by Capital One and Forrester Consulting finds. They say internal, cross-organizational, and external data silos slowed machine learning deployments and outcomes.